Goto

Collaborating Authors

 pddl problem


Grounded Vision-Language Interpreter for Integrated Task and Motion Planning

arXiv.org Artificial Intelligence

While recent advances in vision-language models have accelerated the development of language-guided robot planners, their black-box nature often lacks safety guarantees and interpretability crucial for real-world deployment. Conversely, classical symbolic planners offer rigorous safety verification but require significant expert knowledge for setup. To bridge the current gap, this paper proposes ViLaIn-TAMP, a hybrid planning framework for enabling verifiable, interpretable, and autonomous robot behaviors. ViLaIn-TAMP comprises three main components: (1) a Vision-Language Interpreter (ViLaIn) adapted from previous work that converts multimodal inputs into structured problem specifications, (2) a modular Task and Motion Planning (TAMP) system that grounds these specifications in actionable trajectory sequences through symbolic and geometric constraint reasoning, and (3) a corrective planning (CP) module which receives concrete feedback on failed solution attempts and feed them with constraints back to ViLaIn to refine the specification. We design challenging manipulation tasks in a cooking domain and evaluate our framework. Experimental results demonstrate that ViLaIn-TAMP outperforms a VLM-as-a-planner baseline by 18% in mean success rate, and that adding the CP module boosts mean success rate by 32%.


Toward PDDL Planning Copilot

arXiv.org Artificial Intelligence

Large Language Models (LLMs) are increasingly being used as autonomous agents capable of performing complicated tasks. However, they lack the ability to perform reliable long-horizon planning on their own. This paper bridges this gap by introducing the Planning Copilot, a chatbot that integrates multiple planning tools and allows users to invoke them through instructions in natural language. The Planning Copilot leverages the Model Context Protocol (MCP), a recently developed standard for connecting LLMs with external tools and systems. This approach allows using any LLM that supports MCP without domain-specific fine-tuning. Our Planning Copilot supports common planning tasks such as checking the syntax of planning problems, selecting an appropriate planner, calling it, validating the plan it generates, and simulating their execution. We empirically evaluate the ability of our Planning Copilot to perform these tasks using three open-source LLMs. The results show that the Planning Copilot highly outperforms using the same LLMs without the planning tools. We also conducted a limited qualitative comparison of our tool against Chat GPT-5, a very recent commercial LLM. Our results shows that our Planning Copilot significantly outperforms GPT-5 despite relying on a much smaller LLM. This suggests dedicated planning tools may be an effective way to enable LLMs to perform planning tasks.


Language Models For Generalised PDDL Planning: Synthesising Sound and Programmatic Policies

arXiv.org Artificial Intelligence

We study the usage of language models (LMs) for planning over world models specified in the Planning Domain Definition Language (PDDL). We prompt LMs to generate Python programs that serve as generalised policies for solving PDDL problems from a given domain. Notably, our approach synthesises policies that are provably sound relative to the PDDL domain without reliance on external verifiers. We conduct experiments on competition benchmarks which show that our policies can solve more PDDL problems than PDDL planners and recent LM approaches within a fixed time and memory constraint. Our approach manifests in the LMPlan planner which can solve planning problems with several hundreds of relevant objects. Surprisingly, we observe that LMs used in our framework sometimes plan more effectively over PDDL problems written in meaningless symbols in place of natural language; e.g. rewriting (at dog kitchen) as (p2 o1 o3). This finding challenges hypotheses that LMs reason over word semantics and memorise solutions from its training corpus, and is worth further exploration.


L3M+P: Lifelong Planning with Large Language Models

arXiv.org Artificial Intelligence

By combining classical planning methods with large language models (LLMs), recent research such as LLM+P has enabled agents to plan for general tasks given in natural language. However, scaling these methods to general-purpose service robots remains challenging: (1) classical planning algorithms generally require a detailed and consistent specification of the environment, which is not always readily available; and (2) existing frameworks mainly focus on isolated planning tasks, whereas robots are often meant to serve in long-term continuous deployments, and therefore must maintain a dynamic memory of the environment which can be updated with multi-modal inputs and extracted as planning knowledge for future tasks. To address these two issues, this paper introduces L3M+P (Lifelong LLM+P), a framework that uses an external knowledge graph as a representation of the world state. The graph can be updated from multiple sources of information, including sensory input and natural language interactions with humans. L3M+P enforces rules for the expected format of the absolute world state graph to maintain consistency between graph updates. At planning time, given a natural language description of a task, L3M+P retrieves context from the knowledge graph and generates a problem definition for classical planners. Evaluated on household robot simulators and on a real-world service robot, L3M+P achieves significant improvement over baseline methods both on accurately registering natural language state changes and on correctly generating plans, thanks to the knowledge graph retrieval and verification.


Instruction-Augmented Long-Horizon Planning: Embedding Grounding Mechanisms in Embodied Mobile Manipulation

arXiv.org Artificial Intelligence

Enabling humanoid robots to perform long-horizon mobile manipulation planning in real-world environments based on embodied perception and comprehension abilities has been a longstanding challenge. With the recent rise of large language models (LLMs), there has been a notable increase in the development of LLM-based planners. These approaches either utilize human-provided textual representations of the real world or heavily depend on prompt engineering to extract such representations, lacking the capability to quantitatively understand the environment, such as determining the feasibility of manipulating objects. To address these limitations, we present the Instruction-Augmented Long-Horizon Planning (IALP) system, a novel framework that employs LLMs to generate feasible and optimal actions based on real-time sensor feedback, including grounded knowledge of the environment, in a closed-loop interaction. Distinct from prior works, our approach augments user instructions into PDDL problems by leveraging both the abstract reasoning capabilities of LLMs and grounding mechanisms. By conducting various real-world long-horizon tasks, each consisting of seven distinct manipulatory skills, our results demonstrate that the IALP system can efficiently solve these tasks with an average success rate exceeding 80%. Our proposed method can operate as a high-level planner, equipping robots with substantial autonomy in unstructured environments through the utilization of multi-modal sensor inputs.


An Extensive Evaluation of PDDL Capabilities in off-the-shelf LLMs

arXiv.org Artificial Intelligence

In recent advancements, large language models (LLMs) have exhibited proficiency in code generation and chain-of-thought reasoning, laying the groundwork for tackling automatic formal planning tasks. This study evaluates the potential of LLMs to understand and generate Planning Domain Definition Language (PDDL), an essential representation in artificial intelligence planning. We conduct an extensive analysis across 20 distinct models spanning 7 major LLM families, both commercial and open-source. Our comprehensive evaluation sheds light on the zero-shot LLM capabilities of parsing, generating, and reasoning with PDDL. Our findings indicate that while some models demonstrate notable effectiveness in handling PDDL, others pose limitations in more complex scenarios requiring nuanced planning knowledge. These results highlight the promise and current limitations of LLMs in formal planning tasks, offering insights into their application and guiding future efforts in AI-driven planning paradigms.


Planning with affordances: Integrating learned affordance models and symbolic planning

arXiv.org Artificial Intelligence

Intelligent agents working in real-world environments must be able to learn about the environment and its capabilities which enable them to take actions to change to the state of the world to complete a complex multi-step task in a photorealistic environment. Learning about the environment is especially important to perform various multiple-step tasks without having to redefine an agent's action set for different tasks or environment settings. In our work, we augment an existing task and motion planning framework with learned affordance models of objects in the world to enable planning and executing multi-step tasks using learned models. Each task can be seen as changing the current state of the world to a given goal state. The affordance models provide us with what actions are possible and how to perform those actions in any given state. A symbolic planning algorithm uses this information and the starting and goal state to create a feasible plan to reach the desired goal state to complete a given task. We demonstrate our approach in a virtual 3D photorealistic environment, AI2-Thor, and evaluate it on real-world tasks. Our results show that our agent quickly learns how to interact with the environment and is well prepared to perform tasks such as "Moving an object out of the way to reach the desired location." In real-world environments, the ability to come up with multi-step plans for a particular task is an important skill for an intelligent agent. Moreover, it is equally important that the agent be able to interact with the environment to execute this plan. For example, for efficient navigation of an environment, the agent must generate multi-step plans, including navigating through different rooms while clearing out any objects blocking the path. However, the agent must also be able to interact with the environment correctly for actions such as opening the door or picking up an object that is blocking the way.


Planning with Vision-Language Models and a Use Case in Robot-Assisted Teaching

arXiv.org Artificial Intelligence

Automating the generation of Planning Domain Definition Language (PDDL) with Large Language Model (LLM) opens new research topic in AI planning, particularly for complex real-world tasks. This paper introduces Image2PDDL, a novel framework that leverages Vision-Language Models (VLMs) to automatically convert images of initial states and descriptions of goal states into PDDL problems. By providing a PDDL domain alongside visual inputs, Imasge2PDDL addresses key challenges in bridging perceptual understanding with symbolic planning, reducing the expertise required to create structured problem instances, and improving scalability across tasks of varying complexity. We evaluate the framework on various domains, including standard planning domains like blocksworld and sliding tile puzzles, using datasets with multiple difficulty levels. Performance is assessed on syntax correctness, ensuring grammar and executability, and content correctness, verifying accurate state representation in generated PDDL problems. The proposed approach demonstrates promising results across diverse task complexities, suggesting its potential for broader applications in AI planning. We will discuss a potential use case in robot-assisted teaching of students with Autism Spectrum Disorder.


Handling abort commands for household kitchen robots

arXiv.org Artificial Intelligence

The task of aborting commands is essentially a problem of planning, or rather replanning whose core value comes when A. Handling abort commands a robotic system is able to autonomously infer a fallback plan without a human in the loop. A robot enhanced with capabilities The key challenges of handling cancel commands have of handling abort commands will able to reconfigure been acknowledged by Haarland et al. [1]. These challenges and replan its actions so that it can leave its environment in a include: the complex relationship between a goal and its clean state represents a step towards a more robust solution, subgoals, highlighting a need for a recursive approach for given the fact that in the world of robotics malfunctions and them, while also taking into account the plans in progress, unresponsiveness are risks that can be mitigated by having a i.e. the actual state of the world. Handling such scenarios is fallback mechanism.


TwoStep: Multi-agent Task Planning using Classical Planners and Large Language Models

arXiv.org Artificial Intelligence

Classical planning formulations like the Planning Domain Definition Language (PDDL) admit action sequences guaranteed to achieve a goal state given an initial state if any are possible. However, reasoning problems defined in PDDL do not capture temporal aspects of action taking, for example that two agents in the domain can execute an action simultaneously if postconditions of each do not interfere with preconditions of the other. A human expert can decompose a goal into largely independent constituent parts and assign each agent to one of these subgoals to take advantage of simultaneous actions for faster execution of plan steps, each using only single agent planning. By contrast, large language models (LLMs) used for directly inferring plan steps do not guarantee execution success, but do leverage commonsense reasoning to assemble action sequences. We combine the strengths of classical planning and LLMs by approximating human intuitions for two-agent planning goal decomposition. We demonstrate that LLM-based goal decomposition leads to faster planning times than solving multi-agent PDDL problems directly while simultaneously achieving fewer plan execution steps than a single agent plan alone and preserving execution success. Additionally, we find that LLM-based approximations of subgoals can achieve similar multi-agent execution steps than those specified by human experts. Website and resources at https://glamor-usc.github.io/twostep